Continuous Upper Confidence Trees

Authors

  • Adrien Couëtoux
  • Jean-Baptiste Hoock
  • Nataliya Sokolovska
  • Olivier Teytaud
  • Nicolas Bonnard
Abstract

Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games such as Go, they are surprisingly efficient in high-dimensional problems in particular. They are known to be adaptable to continuous domains in some cases, in particular continuous action spaces. We present here an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double-progressive widening, which handles the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias by the first nodes) and which extends the classical progressive widening; and (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We conjecture that the double-progressive widening trick can be used for other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
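
The abstract does not give the widening rule itself, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the commonly used progressive-widening test, in which a node visited n times may hold at most ceil(c * n^alpha) children, and applies that test to the sampled random outcomes of a chance node; in a double-progressive-widening search the same test would also limit the actions tried at each decision node. The names allowed_children, should_widen and ChanceNode, and the defaults c = 1.0 and alpha = 0.5, are hypothetical and not taken from the paper.

```python
import math
import random


def allowed_children(visits, c=1.0, alpha=0.5):
    """Assumed widening budget: ceil(c * visits**alpha), at least 1."""
    return max(1, math.ceil(c * visits ** alpha))


def should_widen(num_children, visits, c=1.0, alpha=0.5):
    """True if the node may still receive a new child after `visits` visits."""
    return num_children < allowed_children(visits, c, alpha)


class ChanceNode:
    """Hypothetical chance node: stores the next states sampled so far."""

    def __init__(self):
        self.visits = 0
        self.outcomes = {}  # sampled next state -> number of times used

    def sample_outcome(self, draw_next_state, rng):
        """Return a next state, creating a new child only when widening allows it."""
        self.visits += 1
        if should_widen(len(self.outcomes), self.visits):
            # Bias control: add a fresh outcome drawn from the transition model.
            state = draw_next_state(rng)
        else:
            # Variance control: revisit a stored outcome, weighted by its count.
            states = list(self.outcomes)
            weights = [self.outcomes[s] for s in states]
            state = rng.choices(states, weights=weights)[0]
        self.outcomes[state] = self.outcomes.get(state, 0) + 1
        return state


if __name__ == "__main__":
    rng = random.Random(0)
    node = ChanceNode()
    for _ in range(1000):
        # Continuous transition model: every fresh draw is a new state.
        node.sample_outcome(lambda r: r.random(), rng)
    print(len(node.outcomes), "distinct outcomes after", node.visits, "visits")
```

Without the widening test, a continuous transition model would create a new child on almost every visit, so no child would ever be simulated more than once. With it, the number of children grows slowly while each stored child keeps receiving further simulations.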

Similar resources

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

Upper Confidence Trees (UCT) are now a well-known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in the case where the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states an...

An Upper Bound on the First Zagreb Index in Trees

In this paper we give sharp upper bounds on the Zagreb indices and characterize all trees achieving equality in these bounds. We also give a lower bound on the first Zagreb coindex of trees.

Outer and Inner Confidence Intervals Based on Extreme Order Statistics in a Proportional Hazard Model

Let $M_i$ and $M_i'$ be the maximum and minimum of the $i$th sample from $k$ independent samples with different sample sizes, respectively. Suppose that the survival distribution function of the $i$th sample is $\bar{F}_i = \bar{F}^{\alpha_i}$, where $\alpha_i$ is a known positive constant. It is shown how various exact non-parametric inferential procedures can be developed on the basis of the $M_i$'s and $M_i'$'s for distribution ...

Improving the Exploration in Upper Confidence Trees

In the standard version of the UCT algorithm, in the case of a continuous set of decisions, the exploration of new decisions is done through blind search. This can lead to very inefficient exploration, particularly in high-dimensional problems, which often arise in energy management, for instance. In an attempt to use the information gathered through past simulations to be...

On trees attaining an upper bound on the total domination number

A total dominating set of a graph $G$ is a set $D$ of vertices of $G$ such that every vertex of $G$ has a neighbor in $D$. The total domination number of a graph $G$, denoted by $\gamma_t(G)$, is the minimum cardinality of a total dominating set of $G$. Chellali and Haynes [Total and paired-domination numbers of a tree, AKCE International Journal of Graphs and Combinatorics 1 (2004), 6...

Publication date: 2011